DOCUMENT RESUME 



ED 263 171 



TM 850 661 



AUTHOR          Weiss, David J.
TITLE           Computerized Adaptive Measurement of Achievement and
                Ability. Final Report.
INSTITUTION     Minnesota Univ., Minneapolis. Dept. of Psychology.
SPONS AGENCY    Air Force Human Resources Lab., Brooks AFB, Texas.;
                Air Force Office of Scientific Research, Arlington,
                Va.; Army Research Inst. for the Behavioral and
                Social Sciences, Alexandria, Va.; Office of Naval
                Research, Arlington, Va. Personnel and Training
                Research Programs Office.
PUB DATE        Jun 85
CONTRACT        N00014-79-C-0172
NOTE            31p.
AVAILABLE FROM  Computerized Adaptive Testing Laboratory, N660
                Elliott Hall, University of Minnesota, 75 East River
                Road, Minneapolis, MN 55455.
PUB TYPE        Reports - Research/Technical (143) -- Reference
                Materials - Bibliographies (131)
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Ability; *Achievement Tests; *Adaptive Testing;
                *Computer Assisted Testing; Higher Education; Item
                Analysis; *Latent Trait Theory; Mastery Tests; Monte
                Carlo Methods; Testing; Test Items; Test Theory



ABSTRACT 

This research program was designed to investigate the 
applications of item response theory and computerized adaptive 
testing to the unique problems of the measurement of ability and the 
measurement of achievement. The research utilized a combination of 
monte carlo simulation studies and live-testing studies. The research
approach for adaptive achievement testing included: intersubtest 
branching; the dimensionality of measured achievement over time; 
adaptive mastery testing; and adaptive self-referenced testing. The
research approach for adaptive ability testing included: adaptive 
testing strategies; and response modes, test item formats and effects 
of test administration variables. This research is summarized and
related to ten technical reports and other publications. Abstracts of 
the technical reports are included and the major research findings 
are presented. (PN) 



* Reproductions supplied by EDRS are the best that can be made *

* from the original document. * 

**************** 



ERIC 



Final Report 

Computerized Adaptive Measurement 
of Achievement and Ability



David J. Weiss 






June 1985 



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.



Computerized Adaptive Testing Laboratory 
Department of Psychology
University of Minnesota 
Minneapolis MN 55455 



Final Report of Project NR150-433. N00014-79-C-0172 

Supported by the 
Office of Naval Research 
Air Force Human Resources Laboratory 
Air Force Office of Scientific Research 
Army Research Institute 



APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
REPRODUCTION IN WHOLE OR IN PART IS PERMITTED FOR
ANY PURPOSE OF THE UNITED STATES GOVERNMENT.



SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)



REPORT DOCUMENTATION PAGE 


READ INSTRUCTIONS 
BEFORE COMPLETING FORM 


1. REPORT NUMBER

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

Final Report:

Computerized Adaptive Measurement
of Achievement and Ability

5. TYPE OF REPORT & PERIOD COVERED

Final Report

February 1979 - April 1983

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

David J. Weiss

8. CONTRACT OR GRANT NUMBER(s)

N00014-79-C-0172

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Department of Psychology
University of Minnesota
Minneapolis, MN 55455

10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS

PE: 615534 Proj: RR042-04
TA: RR042-04-01 WU: NR150-433

11. CONTROLLING OFFICE NAME AND ADDRESS

Personnel and Training Research Programs
Office of Naval Research
Arlington, VA 22217


12. REPORT DATE 

June 1985 


13. NUMBER OF PAGES 

20 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

Unclassified

15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited. Reproduction in whole
or in part is permitted for any purpose of the United States Government.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)



18. SUPPLEMENTARY NOTES 

This research was supported by funds from the Office of Naval Research, Air 
Force Human Resources Laboratory, Air Force Office of Scientific Research, 
and Army Research Institute, and monitored by the Office of Naval Research.



19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The research program's objectives are described, and the research approach
is summarized and related to the ten technical reports and other project
publications. Thirteen major research findings are presented. Abstracts
of the ten technical reports are also included.



DD FORM 1473, 1 JAN 73    EDITION OF 1 NOV 65 IS OBSOLETE    Unclassified
S/N 0102-LF-014-6601

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)



CONTENTS 



Objectives
    Adaptive Achievement Testing
    Adaptive Ability Testing

Approach
    Adaptive Achievement Testing
        Intersubtest Branching
        The Dimensionality of Measured Achievement Over Time
        Adaptive Mastery Testing
        Adaptive Self-Referenced Testing
    Adaptive Ability Testing
        Adaptive Testing Strategies
        Response Modes, Test Item Formats, and Effects of Test
            Administration Variables

Major Findings
    Adaptive Achievement Testing
    Adaptive Ability Testing

Abstracts of Research Reports
    79-6. Efficiency of an Adaptive Inter-Subtest Branching Strategy
          in the Measurement of Classroom Achievement
    80-4. A Comparison of Adaptive, Sequential, and Conventional
          Testing Strategies for Mastery Decisions
    80-5. An Alternate-Forms Reliability and Concurrent Validity
          Comparison of Bayesian Adaptive and Conventional Ability
          Tests
    81-1. Review of Test Theory and Methods
    81-2. Effects of Immediate Feedback and Pacing of Item
          Presentation on Ability Test Performance and Psychological
          Reactions to Testing
    81-3. A Validity Comparison of Adaptive and Conventional
          Strategies for Mastery Testing
    81-4. Factors Influencing the Psychometric Characteristics of an
          Adaptive Testing Strategy for Test Batteries
    81-5. Dimensionality of Measured Achievement Over Time
    83-2. Bias and Information of Bayesian Adaptive Testing
    83-3. Effect of Examinee Certainty on Probabilistic Test Scores
          and a Comparison of Scoring Methods for Probabilistic
          Responses

References



Final Report

Computerized Adaptive Measurement of Achievement and Ability 



Objectives 

This research program was designed to investigate the applications of item
response theory (IRT) and computerized adaptive testing to the unique problems 
of the measurement of ability and the measurement of achievement. Specific ob- 
jectives relevant to these two areas were as follows: 

Adaptive Achievement Testing 

1. To study the relative efficiency of various approaches to intersubtest
branching in achievement test batteries.

2. To investigate the dimensionality of measured achievement over time. 

3. To study the applicability of IRT models to the problem of mastery testing 
and to compare models for adaptive mastery testing with other approaches to 
the improvement of mastery decisions and/or reduction in test length in mas- 
tery testing. 

4. To explicate the concept of Adaptive Self-Referenced Testing and to examine 
its applicability to the achievement testing problem. 

Adaptive Ability Testing 

5. To evaluate the performance of adaptive testing strategies under conditions 
which more reasonably represent the conditions under which these strategies 
might be used, and to examine the performance of adaptive testing strategies 
in live testing. 

6. To evaluate the utility for adaptive testing of response modes and test item
formats usable in adaptive ability testing. 

Research in pursuance of these objectives began in February 1979 and continued 
through April 1983. 

Approach 

The research utilized a combination of monte carlo simulation studies and 
live-testing studies. 

Adaptive Achievement Testing 

Intersubtest branching. Intersubtest branching is an approach to the
utilization of adaptive testing methodologies in a multidimensional item pool.
In intersubtest branching, IRT item parameters are estimated separately for each
subtest of a multisubtest battery. Using any of a number of adaptive testing
strategies, adaptive testing occurs within the subtest based on appropriate item
selection rules and a test termination criterion appropriate for the purpose of
testing. Upon completion of a subtest in the test battery, the final trait
level estimate (θ̂) is then used as an entry point to begin testing in a
subsequent subtest in the battery. As originally proposed, subtests in a battery
are ordered by the magnitudes of the squared multiple correlations of each
subtest with all other subtests in the battery. In this way, the entry points
for adaptive testing in each subtest utilize the information available in the
tests in the test battery that were most highly correlated with it, which should
shorten the adaptive tests for later subtests in the battery as much as possible.
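
The ordering rule described above can be illustrated with a short sketch. It is
not taken from the report: the correlation matrix, the subtest names, and the
function names are hypothetical. The squared multiple correlation (SMC) of each
subtest with all others is computed from the standard identity
SMC_i = 1 - 1/(R^-1)_ii, where R is the subtest intercorrelation matrix.
Ordering from highest to lowest SMC is an assumption made here for
illustration; the text above does not state the direction.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each variable with all the others: 1 - 1 / diag(R^-1)."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]


def order_subtests(names, corr):
    """Pair each subtest with its SMC and sort from highest to lowest (assumed)."""
    smc = squared_multiple_correlations(corr)
    return sorted(zip(names, smc), key=lambda pair: pair[1], reverse=True)


# Hypothetical intercorrelations among three subtests.
names = ["A", "B", "C"]
corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
ordering = order_subtests(names, corr)
for name, smc in ordering:
    print(f"subtest {name}: SMC = {smc:.3f}")
```

In this made-up example subtest B has the largest SMC because it correlates
most strongly with both of the other subtests.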

Intersubtest adaptive branching was studied by real-data simulation in
Research Report 79-6, and by monte carlo simulation in Research Report 81-4. The
study reported in Research Report 79-6 used data from conventionally-administer- 
ed tests which were analyzed as if they had been administered as an adaptive 
test, and the intersubtest branching strategy was applied to these data. This 
study was designed to separate the effects of the adaptive intrasubtest item 
selection procedure from those effects due to intersubtest branching. The study 
also (1) allowed evaluation of the effects of different intrasubtest termination 
criteria, (2) investigated the effect of taking into account errors of measure- 
ment in the multiple regression procedure used to determine test entry points, 
and (3) investigated the stability of the regression equations in cross-valida- 
tion. 

Other aspects of the intersubtest branching strategy when applied to an 
achievement test battery were investigated by monte carlo simulation in Research 
Report 81-4. Questions of interest in this study included (1) the effects of
varying subtest order, (2) the utilization of different subtest termination
criteria, and (3) the effect of variable versus fixed entry on the psychometric
properties of the intersubtest branching strategy. Dependent variables included
(1) reductions in test length, (2) effect on test information, and (3) correla-
tions between achievement estimates and true achievement levels. The study de- 
sign also permitted separation of the effects of intrasubtest and intersubtest 
adaptive branching. 

The dimensionality of measured achievement over time. The effects of
instruction on measured achievement are usually measured at a single point in
time. That is, some instruction is given to an individual and at the end of the 
period of instruction an achievement test is used to determine whether the indi- 
vidual has reached an appropriate level of achievement. On the basis of such 
information, aggregated across individuals, decisions are frequently made about 
the adequacy of instructional programs, or about the impact (or lack thereof) of 
instruction on a specific individual. 

A more powerful approach to the measurement of achievement would involve 
the use of pretests and posttests to determine if any change has occurred in 
measured achievement over time. Using change scores, however, implies that the 
variable being measured is the same at pretest as it is at posttest. There has
been very little empirical data available concerning this issue.

Research Report 81-5 was designed to investigate the question of whether
the achievement factor identified at pretest in an achievement test is the same
factor identified at posttest. Two studies utilized data on groups of college
students from measured achievement in mathematics classes and biology classes.
Achievement test item responses were factor analyzed prior to instruction, and
again at the end of instruction. In addition, mean differences in test scores
at pretest and posttest were analyzed. Factors obtained at pretest were
compared with those obtained at posttest to determine if the same factor was
found prior to and after instruction.

Kingsbury (1984) directly examined the characteristics of change scores
derived from adaptive and conventional tests. This study utilized data from 
college-level biology examinations. Both adaptive and conventional tests were 
administered in a complex design to groups of students in such a way that relia- 
bilities of the change scores could be determined separately for the two types 
of tests in a number of different homogeneous content areas covered in the
course. The question raised by this study was based on hypotheses that the more 
precise achievement level estimates resulting from adaptive testing should also 
result in more reliable change scores in comparison to those from conventional 
testing. Also studied was the effect of variable- versus fixed-length test ter- 
mination on the adaptive tests. 

Adaptive Mastery Testing. Adaptive mastery testing (AMT) combines IRT and
adaptive testing into an efficient strategy for making mastery or classification
decisions. In this procedure, items used to make a mastery decision are
selected by an IRT maximum information adaptive testing strategy. Item responses
are scored using a Bayesian θ estimation procedure, and a confidence or
credibility interval is computed for the θ estimate. The confidence interval
around the estimate is then compared to a mastery cutoff score, which is also
expressed on the θ metric. A mastery decision is determined on the basis of
whether the credibility interval overlaps with the mastery criterion level, and
on which side of the mastery cutoff score the individual's θ estimate falls.
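
The decision rule just described can be sketched in a few lines. This is an
illustrative sketch under stated assumptions, not the scoring procedure used in
the project: the Bayesian θ estimate is approximated here on a grid under a
normal prior and the three-parameter logistic model, the scaling constant
D = 1.7 and the 1.96 multiplier for the interval are assumptions, and all
function names are hypothetical. When the interval still contains the cutoff,
the sketch simply signals that testing should continue.

```python
import math

D = 1.7  # logistic scaling constant (assumed for this sketch)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def posterior_mean_sd(responses, items, prior_mean=0.0, prior_sd=1.0):
    """Grid approximation to the posterior mean and SD of theta (normal prior)."""
    grid = [-4.0 + 0.05 * k for k in range(161)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def mastery_decision(mean, sd, cutoff, z=1.96):
    """Classify only when the interval mean +/- z*sd excludes the cutoff."""
    lower, upper = mean - z * sd, mean + z * sd
    if lower > cutoff:
        return "mastery"
    if upper < cutoff:
        return "nonmastery"
    return "continue"


# Hypothetical example: five items of equal (a, b, c), all answered correctly.
items = [(1.0, 0.0, 0.2)] * 5
mean, sd = posterior_mean_sd([1, 1, 1, 1, 1], items)
print(round(mean, 2), round(sd, 2), mastery_decision(mean, sd, cutoff=0.0))
```

A variable-length test under this sketch would administer another item
whenever `mastery_decision` returns "continue" and stop otherwise.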

Both monte carlo simulation and live testing were used to investigate char- 
acteristics of the AMT strategy and to compare it with other approaches for mak- 
ing mastery decisions. In Research Report 80-4 (also Kingsbury & Weiss, 1980a) 
the AMT procedure was compared to a conventionally-based mastery testing proce- 
dure and to a procedure based on Wald's sequential probability ratio test. The 
procedures were compared in terms of their efficiency, based on the test length 
required by the procedures to make a classification decision, on the validity of
the decisions made by each procedure, and on the type of classifications made by 
each of the three testing procedures. 

To examine the generality of the findings in live testing, in Research Re- 
port 81-3 the AMT procedure and a conventional test were administered to stu- 
dents in a biology class. Contrary to earlier studies which examined the AMT 
procedure, actual adaptive mastery tests were administered to one subgroup of 
students while the other received computer-administered conventional tests. The 
performance of the two testing strategies was evaluated in terms of a mastery 
criterion based on the students' final standing in the course, which was a
combination of their performance on course examinations and laboratory grades.

Adaptive Self-Referenced Testing. Adaptive self-referenced testing (ASRT)
is a combination of IRT and adaptive testing designed to permit the efficient
measurement of changes in achievement levels due to exposure to instruction.
This procedure is designed to measure individual changes in achievement in a
unidimensional item pool in a very efficient manner at a number of points of
instruction. It is thus an appropriate conceptualization for tracking
individual changes due to instruction at a number of points during a course,
since it permits an instructor to evaluate an individual's performance on a
minimum number of items at each of a number of testing occasions.

ASRT permits an instructor to measure a student early in a course, such as 
on the first day, and as frequently as is necessary during the course. Based on 
adaptive testing methodology, the data obtained from the Time 1 testing are used 
as the entry point to Time 2 adaptive test administration, and this process is
followed for any number of test administrations. In addition, test termination
at any point in time can be based on the standard error band associated with an
individual's θ estimate at that point in time. ASRT is designed to
simultaneously permit intraindividual measurement of change, norm-based
measurement on the θ metric which can then be converted to the
proportion-correct measurement if desired, and a mastery-based
(criterion-referenced) achievement level estimate utilizing the procedures of
AMT. While no research directly related to ASRT was done during the contract
period, the method was described in some detail in Weiss & Kingsbury (1984).
Both Research Report 81-5 and the Kingsbury (1984) study have implications for
the use of ASRT and its future development.

Adaptive Ability Testing 

Adaptive testing strategies. A major focus of this research program was on
the evaluation of different approaches to computerized adaptive testing. While 
earlier projects were concerned primarily with evaluating the relative perfor- 
mance of adaptive and conventional testing strategies, in this project the focus 
was on the IRT-based strategies and on their performance under a variety of
conditions. An overview of some aspects of project research is given by Weiss
(1982). 

The performance of a Bayesian adaptive testing strategy was reported in
Research Report 83-2 (also Weiss & McBride, 1984). Owen's Bayesian adaptive
testing strategy was examined in three studies which utilized an accurate prior
θ estimate, a constant prior θ estimate with fixed test length, and a constant
prior θ estimate with variable test length. The performance of the adaptive
testing strategy was examined in terms of the bias and information of the θ
estimates as a function of θ. Also examined was the mean number of items
administered in the variable test length condition.

A major concern of the research was to evaluate the performance of adaptive 
testing strategies under increasingly realistic conditions. Prior to these
studies, all studies evaluating the performance of adaptive testing strategies
did so under relatively unrealistic conditions. While characteristics of the
item pools varied in these earlier studies, the IRT item parameters used in 
these simulation studies were considered to be accurate. However, in real item 
pools, there is always some error associated with the item parameter estimates. 
Since adaptive testing is designed to select items on the basis of these item 
parameter estimates, it can be assumed that any degree of inaccuracy in the item 
parameter estimates will have detrimental effects on the performance of adaptive 
testing strategies. 






Consequently, two studies were designed to investigate effects of errors in
item parameter estimates on the performance of maximum information and Bayesian
adaptive testing strategies. The first study (Crichton, 1981) assessed the
effects of errors in item parameter estimates in the context of the
three-parameter logistic model. Crichton compared the performance of the two
IRT-based adaptive testing strategies (maximum information and Bayesian) with
the stratified adaptive (stradaptive) strategy, on the hypothesis that the
stradaptive strategy should be less sensitive to errors in the item parameter
estimates. Her monte carlo simulation study varied test length from 5 to 30
items. Test length was then crossed with three levels of error in the
discrimination (a) parameter, four levels of error in estimates for the
difficulty (b) parameter, and two levels of error in the pseudo-guessing (c)
parameter. In addition to considering these effects for a, b, and c separately,
two datasets examined the effects of joint errors in the a, b, and c
parameters. Dependent variables conditional on θ included the bias, root mean
square error, inaccuracy, and information in the θ estimates, and the
correlation of θ and θ̂.
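
To make concrete why errors in item parameter estimates could matter for a
maximum information strategy, the following sketch selects an item using
error-laden parameters and then evaluates the selected item with the true
parameters. It is a hedged illustration only: the item pool, the error values,
the scaling constant D = 1.7, and the function names are all hypothetical and
are not taken from Crichton (1981). The information function is the standard
one for the three-parameter logistic model.

```python
import math

D = 1.7  # logistic scaling constant (assumed for this sketch)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information for the three-parameter logistic model."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def select_item(theta, pool):
    """Index of the item in pool with maximum information at theta."""
    return max(range(len(pool)), key=lambda i: item_information(theta, *pool[i]))


# Hypothetical true (a, b, c) parameters and hypothetical errors in a and b.
true_pool = [(1.2, -1.0, 0.2), (1.0, 0.0, 0.2), (0.8, 1.0, 0.2)]
errors = [(0.3, 0.2), (-0.4, -0.1), (0.9, -0.8)]
est_pool = [(a + da, b + db, c) for (a, b, c), (da, db) in zip(true_pool, errors)]

theta = 0.0
best_true = select_item(theta, true_pool)   # choice with perfect knowledge
chosen = select_item(theta, est_pool)       # choice from error-laden estimates
info_lost = (item_information(theta, *true_pool[best_true])
             - item_information(theta, *true_pool[chosen]))
print(best_true, chosen, round(info_lost, 3))
```

The quantity `info_lost` can never be negative, since the item chosen from
the true parameters is by construction the most informative one available.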

Mattson (1983) also examined the performance of adaptive testing strategies 
under conditions of error in item parameter estimates, using monte carlo simula- 
tion. Mattson extended the Crichton study by studying similar effects in the 
one- and two-parameter logistic models, in addition to the three-parameter mod- 
el. Whereas Crichton limited her trait level estimation to maximum likelihood 
scoring of the response vectors, Mattson also included Bayesian scoring of the 
maximum information and Bayesian adaptive tests. In addition, Mattson allowed 
the level of correlation between the a and b parameters to vary at four
different levels, as well as examining the uncorrelated condition used by
Crichton. Similar to Crichton, Mattson also varied test length from 10 to 30
items. Finally, Mattson allowed errors in the a parameter to vary at two levels,
examined four levels of error in b, and one level of error in c. All conditions were
crossed with each other. Mattson's dependent variables were the same as those 
studied by Crichton. 

A second factor that can affect the performance of adaptive testing strate- 
gies in a realistic item pool is the dimensionality of the item pool. Since all 
IRT models assume a unidimensional item pool, deviations from unidimensionality 
would be expected to affect the performance of adaptive testing strategies in 
real item pools, which are rarely (if ever) strictly unidimensional. As a
result, Suhadolnik and Weiss (1985) examined the robustness of adaptive testing
to multidimensionality.

In this study, the maximum information adaptive testing strategy using
maximum likelihood scoring was applied to datasets varying from strictly
unidimensional to four-factor datasets that reflected the structure of the most
multidimensional subtest of the Armed Services Vocational Aptitude Battery.
Between these extremes were two- and three-factor datasets in which the second
and third factors accounted for varying proportions of variance in comparison to
the first factor, thus simulating item structures varying from very little
multidimensionality to a very high degree of multidimensionality. A total of 45
data structures was examined.

To evaluate the effects of multidimensionality, dichotomous item responses
were simulated from the specified multidimensional structures. These item
responses were then treated as if they were derived from a unidimensional model,
and adaptive testing was implemented using the item response vectors. To
evaluate the performance of the maximum information adaptive testing strategy
under multidimensionality, the conditional bias, inaccuracy, and root mean
square error of the θ estimates were computed relative to the true first factor
θ from the multidimensional structure.
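
The three evaluation indices named above can be sketched as follows. The report
does not give formulas in this section, so the definitions used here are
assumptions: bias is taken as the mean signed error of the θ estimates,
inaccuracy as the mean absolute error, and root mean square error in its usual
sense. "Conditional" indices are obtained by computing the same quantities
separately at each true θ level. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def evaluation_indices(true_thetas, estimates):
    """Return (bias, inaccuracy, rmse) for paired true and estimated thetas."""
    errors = [est - true for est, true in zip(estimates, true_thetas)]
    n = len(errors)
    bias = sum(errors) / n                          # mean signed error
    inaccuracy = sum(abs(e) for e in errors) / n    # mean absolute error (assumed)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, inaccuracy, rmse


def conditional_indices(true_thetas, estimates):
    """Compute the same indices separately for each distinct true theta level."""
    groups = defaultdict(lambda: ([], []))
    for true, est in zip(true_thetas, estimates):
        groups[true][0].append(true)
        groups[true][1].append(est)
    return {level: evaluation_indices(t, e) for level, (t, e) in groups.items()}


# Hypothetical simulated examinees at two true theta levels.
true_thetas = [-1.0, -1.0, 1.0]
estimates = [-0.5, -1.5, 1.2]
for level, (bias, inacc, rmse) in sorted(conditional_indices(true_thetas, estimates).items()):
    print(f"theta = {level:+.1f}: bias = {bias:.2f}, inaccuracy = {inacc:.2f}, RMSE = {rmse:.2f}")
```

Reporting the indices by θ level rather than pooled across levels keeps
errors of opposite sign at different levels from cancelling in the bias.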

Response modes, test item formats, and effects of test administration
variables. The administration of ability tests by interactive computers allows
the use of item types that transcend the typical dichotomously-scored
multiple-choice test item. Research Report 83-3 examined aspects of a
probabilistic response mode used in conjunction with the typical multiple-choice item format.
This response mode was chosen as one means of extracting additional information 
from a multiple-choice item, rather than simply requiring a choice of a single 
response alternative. 

A major problem with probabilistic responding to multiple-choice items in 
conventional paper-and-pencil test administration is that examinees do not
always follow the instructions carefully, so that the probabilities they assign
to the item responses do not always sum to 1.00. As a consequence, large amounts
of data might be lost for a given examinee. When multiple-choice items are
answered in a probabilistic mode on a computer terminal, however, the validity
of the distribution of the probabilities can be checked immediately for each
individual's responses to each test item, and invalid responses can be adjusted
until they meet the appropriate criteria.
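
A minimal sketch of the kind of immediate validity check described above is
given below. It is an assumption-laden illustration, not the project's
software: the function name, the tolerance, and the wording of the messages are
hypothetical. The check confirms that each assigned probability lies between 0
and 1 and that the probabilities for an item sum to 1.00, so that an invalid
response can be returned to the examinee for adjustment.

```python
def check_response(probabilities, tolerance=1e-6):
    """Return (is_valid, message) for one probabilistic item response."""
    if not probabilities:
        return False, "no probabilities were entered"
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        return False, "each probability must lie between 0 and 1"
    total = sum(probabilities)
    if abs(total - 1.0) > tolerance:
        return False, f"probabilities sum to {total:.2f}, not 1.00"
    return True, "response accepted"


# Hypothetical responses to a four-alternative multiple-choice item.
print(check_response([0.70, 0.10, 0.10, 0.10]))  # valid distribution
print(check_response([0.70, 0.30, 0.20, 0.10]))  # sums to more than 1.00
```

In an interactive administration, the item would be presented again with the
returned message whenever the first element of the result is False.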

The utility of the probabilistic response mode was examined first by com- 
paring the usefulness of different scoring formulas associated with the response 
mode. Then, the factor structure resulting from the probabilistic response mode 
was studied in comparison to the factor structure obtained from scoring the
responses dichotomously. Also examined in Research Report 83-3 were the
validities of the scores obtained from the different scoring methods, their
reliabilities, and the effects of certainty or risk-taking on the probabilistic scores.

Thompson's (1983) study also involved the administration of items in dif- 
ferent response formats to college students. The study crossed two response 
formats (categorical and probabilistic) with two item types (multiple-choice and 
dichotomous) to obtain four different types of test items. These were (1) the
conventional multiple-choice item; (2) a probabilistic multiple-choice item, 
similar to that used in Research Report 83-3; (3) a dichotomous (yes, no) item; 
and (4) a dichotomous-probabilistic item in which an examinee answered by stat- 
ing, with a number between 0 and 100, his/her confidence that the answer to the 
question was the correct answer. Similar to Research Report 83-3, Thompson
investigated several scoring systems for the probabilistic items. In addition,
the four test item types were evaluated in terms of the intercorrelations of the 
scores they provided, their reliabilities, and their factor structures. 

One other factor related to adaptive testing examined in this project con- 
cerned the effects of test administration variables on ability test performance 
and psychological reactions to testing. This study (Research Report 81-2) in- 
vestigated the effects of two variables unique to computer administration. One 
variable was immediate knowledge of results of the correctness of each item
response during the process of test administration. The second variable, pacing
of item presentation, was concerned with whether the pace of the test
administration was controlled by the examinee or by the computer. The two
variables were studied in both computer-administered conventional and adaptive
tests. The dependent variables included ability test performance (maximum
likelihood θ estimates and proportion correct), response pattern information,
item response latencies, and psychological reactions to testing. Data were
obtained from 477 college students who were randomly assigned to the
experimental conditions.



Major Findings

Adaptive Achievement Testing

1. Adaptive intersubtest branching is a feasible approach to improving the
efficiency of test administration when a test battery is adaptively
administered. This approach can reduce test battery length by 50% or more with
no appreciable effect on the psychometric characteristics of scores on the
tests in the battery (Research Reports 79-6 and 81-4). Although the major
reductions in test battery length were attributable to adaptive intrasubtest
item selection, there were additional small reductions in test length
due to intersubtest branching. Intersubtest branching also resulted in
test battery information levels that closely approximated those of the full
test battery, in comparison to information levels obtained solely from the
use of adaptive intrasubtest item selection (Research Report 81-4). Results
also indicated (Research Report 81-4) that the order in which subtests
were selected for intersubtest branching had no effect on either the
efficiency of test administration or on the psychometric characteristics of
the resulting test scores.

2. The use of change scores to measure changes in achievement over time, which
assumes that the factor underlying changes in performance is invariant, may
be appropriate in some achievement testing environments and not in others.
Results from college courses (Research Report 81-5) indicated that the factor
structure of measured achievement in a biology course was not the same
prior to instruction as it was after several weeks of instruction. In a
mathematics course, however, the factor structure of measured achievement
did not change over a 10-week period. These results suggest that in the
absence of information to indicate that the dimensionality of measured
achievement does not change over time, it is inappropriate to compute simple
difference scores to measure changes in achievement levels.

3. There was some indication (Kingsbury, 1984) that Bayesian adaptive tests
using an individual prior achievement level estimate resulted in more reliable
change scores than were obtained from comparable conventional achievement
tests. Further research is needed, however, to investigate the
generalizability of these findings in other achievement domains.

4. Adaptive Mastery Testing (AMT) is a viable procedure for reducing test
length of mastery tests and improving the efficiency of mastery
classifications. In monte carlo simulation (Research Report 80-4) AMT achieved
the best combination of test length reduction and validity of mastery
classifications in comparison with a sequential probability ratio classification
procedure and conventional tests scored by proportion correct and IRT-based
Bayesian scoring. The advantages of AMT were most pronounced in realistic
item pools in which items varied in difficulties and discriminations. AMT
also tended to result in a more even balance of false mastery and false
non-mastery classifications in comparison to the sequential procedure.

Results from the simulation study were supported in live testing (Research 
Report 81-3). In comparison to conventional achievement tests, both fixed- 
and variable-length AMTs resulted in mastery classifications that were more
consistent with an independent mastery criterion. The average variable- 
length adaptive test was able to make a high-confidence classification for 
students using only from 2 to 5 items, thus reducing test lengths as much
as 75% to 88% from the 20-item conventional test, with no loss in
classification accuracy.

Adaptive Ability Testing 

5. Owen's Bayesian adaptive testing strategy results in θ estimates that,
under realistic testing conditions, are biased and not of equal precision
across θ levels (Research Report 83-2 and Weiss & McBride, 1984). Only
under the unrealistic situation in which true θ was used as the prior θ did
Owen's procedure result in unbiased θ estimates and reasonably horizontal
information functions. Bias was also differentially affected by item
discriminations for variable-length tests. In addition, for these tests, test
length was an increasing function of θ. The design of these studies
allowed identification of the source of the bias to be the use of a constant
(group) prior θ estimate to begin the Bayesian adaptive testing.

6. Errors in item parameter estimates do not seriously affect the performance
of adaptive testing strategies (Crichton, 1981; Mattson, 1983). In the
3-parameter data (Crichton, 1981) using indices combined across θ levels,
when error was introduced into the separate item parameter estimates the
effects were small for errors within the usually observed range. The a and
b parameters generally had similar effects on adaptive test performance,
while errors in the c parameter had negligible effects. When errors in the
three parameters were combined, effects differed little from the case with
error in a or b except for very unrealistic levels of error. There were
no appreciable differences in susceptibility to error among the
stradaptive, maximum information, and Bayesian adaptive testing strategies.

7. When indices conditional on θ were examined in the 3-parameter data
(Crichton, 1981), Bayesian and maximum information adaptive tests were
somewhat less susceptible to errors in item parameter estimates than was
the stradaptive test. Whereas errors in estimation of the b and c
parameters had little effect on the conditional indices, estimation errors
in the a parameter resulted in the major effects on the conditional indices,
indicating that large errors in estimating a may deteriorate the performance
of the adaptive testing strategies. Even with this deterioration in
performance, however, the adaptive tests still performed better than the
conventional tests for a substantial portion of the θ range.

8. When a and b parameter estimates were allowed to correlate with each other,
as they do in many real item pools, there was no additional effect on θ
estimates beyond that due to errors in uncorrelated item parameter
estimates (Mattson, 1983).

9. Maximum likelihood θ estimation performed better than Bayesian estimation
for lesser degrees of error in the item parameters, and Bayesian estimation
was less affected by item parameter errors for more extreme levels of error,
particularly for the 1- and 2-parameter models (Mattson, 1983).

10. The 2-parameter model was least affected by errors in item parameter
estimates (Mattson, 1983). Under conditions of large errors in item
parameter estimates, the 2-parameter model performed better than the
error-free cases of 1- and 3-parameter models.

11. Multidimensionality has a more serious effect on θ estimates from maximum
information adaptive tests than do errors in item parameter estimates
(Suhadolnik & Weiss, 1985). For multidimensional structures with one or
two factors beyond the first that account for up to one-fourth the variance
of the first factor, overcoming the effects of multidimensionality would
require doubling of adaptive test length. The data also suggested that the
number of factors, and not simply the overall strength of the factor
structure, affects θ estimates, since a single factor beyond the first had
less effect than did two factors that accounted for the same amount of
variance. In general, however, adaptive testing is quite robust to
multidimensional structures of the type most frequently resulting from
careful item selection, i.e., factor structures with a strong first factor
and second or third factors that account for less than one-eighth of the
variance of the first factor.

12. Administration of multiple-choice items in a probabilistic response mode
may be a useful application of computerized test administration. Although
items answered in a probabilistic mode did not result in higher validities
than multiple-choice items responded to dichotomously, the probabilistic
mode resulted in higher reliabilities and a stronger first factor (Research
Report 83-3). Since the stronger first factor would result in higher IRT
item discrimination parameters for these items, adaptive testing based on
items administered probabilistically would likely be more efficient,
resulting in shorter tests or in more precise θ estimates. Additional
analyses of item formats and response modes (Thompson, 1983) showed that
items presented in a dichotomous format yielded different factor structures
than did multiple-choice formats, but supported the higher reliabilities
observed in Research Report 83-3 for the probabilistic response format.

13. Computerized test administration variables (including adaptive vs.
conventional test type, computer- vs. self-paced item administration, and
immediate knowledge of results after each item is administered) do not have
direct effects on ability test performance, as measured by estimated θ
levels (Research Report 81-2). These test administration variables do,
however, have effects on psychological reactions to testing. Immediate
knowledge of results appears to have a standardizing effect on test anxiety
and test-taking motivation, since mean levels of anxiety and motivation
were similar when knowledge of results was provided but different when it
was not. Perceptions of test difficulty were different for adaptive and
conventional tests; students accurately perceived the conventional tests as
either too easy or too difficult, depending on their ability levels, while
the adaptive test was generally accurately perceived as being of
appropriate difficulty.
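
Findings 5 through 11 refer to the a (discrimination), b (difficulty), and c
(lower asymptote) item parameters and to maximum information item selection.
The sketch below shows the standard 3-parameter logistic response function,
its item information function, and a selection step that picks the unused
item with the most information at the current θ estimate. It is offered only
as an illustration: the scaling constant D = 1.7, the function names, and the
three-item pool are assumptions, and none of the code is taken from the
reports summarized here.

```python
import math

D = 1.7  # scaling constant commonly used with the logistic model (assumed)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information of a 3PL item at theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_max_info(theta_hat, pool, used):
    """Index of the unused item with the most information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical pool of (a, b, c) triples.
pool = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]
first = select_max_info(0.0, pool, used=set())
second = select_max_info(0.0, pool, used={first})
print(first, second)
```

Because selection depends on the parameter values supplied in the pool,
passing perturbed values in place of the true ones is one simple way to
explore the kind of sensitivity to estimation error discussed in findings 6
through 10.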






ABSTRACTS OF RESEARCH REPORTS 



Research Report 79-6 
Efficiency of an Adaptive Inter-Subtest Branching Strategy 
in the Measurement of Classroom Achievement 
Kathleen A. Gialluca and David J. Weiss 
November 1979 



A real-data simulation was conducted to investigate the efficiency of an
adaptive testing strategy designed for achievement test batteries applied to
a classroom achievement test. This testing strategy combined adaptive item
selection routines both within and between the subtests of the test battery.
Comparisons were made between the conventionally-administered tests and the
simulated adaptive tests in terms of test length, psychometric information,
and correlations of achievement estimates. Design of the study also
permitted (1) separation of the effects of the adaptive intra-subtest item
selection procedure and inter-subtest branching, (2) evaluation of the
effects of different intra-subtest termination criteria, (3) use of
classical regression equations and regression equations corrected for errors
of measurement in the predictors, and (4) cross-validation stability of the
inter-subtest branching regression predictions. Data consisted of the
responses from 1,600 students to classroom-administered final exams in a
general biology course at the University of Minnesota.

Total test length was reduced by 16% to 30% using the adaptive intra-subtest
item selection strategy with a variable termination criterion that omits
those items providing little information to the measurement process.
Subtest-length reductions ranged from about 8% to 62%. Total test length was
reduced another 1% to 5% (with subtest-length reductions of up to 53%) upon
the addition of an inter-subtest branching strategy that utilized regression
equations with prior information concerning a student's performance.

Reductions in subtest length were accomplished with virtually no loss in
psychometric information. Correlations between the Bayesian achievement
estimates from the adaptive and conventional tests were uniformly high,
typically r = .90 and higher. Results showed that the use of the corrected
regression equations did little to improve the performance of the
inter-subtest branching; although the multiple correlations for the
corrected equations were higher, both the information curves and
correlations of achievement estimates were generally lower.
Cross-validation results indicated that the procedure can be used in
different samples from the same population.

Results from this study generally supported the generality of this adaptive 
testing strategy for reducing achievement test length with no adverse impact on
the quality of the measurements. Suggestions are made for further research with 
this testing strategy. (AD A080956) 
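
The inter-subtest branching idea summarized above, in which a regression
equation turns earlier performance into a starting point for the next
subtest, can be sketched as follows. This is only an illustration under
simplifying assumptions: a single predictor, ordinary least squares, and
made-up calibration values. The report used regression equations whose exact
form is not given in this abstract, and the function names are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def entry_point(prev_estimate, intercept, slope):
    """Predicted starting achievement estimate for the next subtest."""
    return intercept + slope * prev_estimate

# Hypothetical calibration data: estimates on subtest 1 and subtest 2.
subtest1 = [-1.0, 0.0, 1.0, 2.0]
subtest2 = [-0.5, 0.0, 0.5, 1.0]
intercept, slope = fit_simple_regression(subtest1, subtest2)
print(entry_point(1.0, intercept, slope))
```

A student who does well on the first subtest would thus begin the second
subtest with a higher initial estimate, so fewer items are spent locating
the student's achievement level.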






Research Report 80-4 
A Comparison of Adaptive, Sequential, and Conventional Testing 
Strategies for Mastery Decisions 
G. Gage Kingsbury and David J. Weiss
November 1980 

Two procedures for making mastery decisions with variable length tests and a
conventional mastery testing procedure were compared in monte carlo
simulation. The simulation varied the characteristics of the item pool used
for testing and the maximum test length allowed. The procedures were
compared in terms of the mean test length needed to make a decision, the
validity of the decisions made by each procedure, and the types of
classification errors made by each procedure. Both of the variable test
length procedures were found to result in important reductions in mean test
length from the conventional test length. The Sequential Probability Ratio
Test (SPRT) procedure resulted in greater test length reductions, on the
average, than the Adaptive Mastery Testing (AMT) procedure. However, the AMT
procedure resulted both in more valid mastery decisions and in more balanced
error rates than the SPRT procedure under all conditions. In addition, the
AMT procedure produced the best combination of test length and validity.
(AD A094478)
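
For readers unfamiliar with the sequential procedure, a minimal sketch of
Wald's sequential probability ratio test applied to a mastery decision
follows. It assumes every item has the same probability of a correct
response (p0 for a non-master, p1 for a master); these values, the error
rates, and the function name are hypothetical and are not the settings used
in Research Report 80-4.

```python
import math

def sprt_mastery(responses, p0=0.5, p1=0.8, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 (non-master) vs. H1: p = p1 (master).

    responses is a sequence of 1 (correct) and 0 (incorrect). Returns
    (decision, items_used) with decision in 'master', 'nonmaster',
    or 'continue' when neither boundary was crossed.
    """
    upper = math.log((1.0 - beta) / alpha)   # cross it: decide mastery
    lower = math.log(beta / (1.0 - alpha))   # cross it: decide non-mastery
    llr = 0.0                                # running log likelihood ratio
    for n, u in enumerate(responses, start=1):
        if u == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "nonmaster", n
    return "continue", len(responses)

print(sprt_mastery([1] * 10))  # -> ('master', 7)
print(sprt_mastery([0] * 10))  # -> ('nonmaster', 4)
```

With these assumed values an incorrect response moves the ratio further than
a correct one, which is why the non-mastery decision arrives after fewer
items in the example.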

Research Report 80-5 
An Alternate-Forms Reliability and Concurrent Validity 
Comparison of Bayesian Adaptive and Conventional Ability Tests
G. Gage Kingsbury and David J. Weiss 
December 1980 

Two 30-item alternate forms of a conventional test and a Bayesian adaptive test
were administered by computer to 472 undergraduate psychology students. In
addition, each student completed a 120-item paper-and-pencil test, which served
as a concurrent validity criterion test, and a series of very easy questions 
designed to detect students who were not answering conscientiously. All test 
items were five-alternative multiple-choice vocabulary items. Reliability and 
concurrent validity of the two testing strategies were evaluated after the ad- 
ministration of each item for each of the tests, so that trends indicating dif- 
ferences in the testing strategies as a function of test length could be detect- 
ed. For each test, additional analyses were conducted to determine whether the 
two forms of the test were operationally alternate forms. 

Results of the analysis of alternate-forms correspondence indicated that for all 
test lengths greater than 10 items, each of the alternate forms for the two test 
types resulted in fairly constant mean ability level estimates. When the scor- 
ing procedure was equated, the mean ability levels estimated from the two forms 
of the conventional test differed to a greater extent than those estimated from 
the two forms of the Bayesian adaptive test.

The alternate-forms reliability analysis indicated that the two forms of the
Bayesian test resulted in more reliable scores than the two forms of the
conventional test for all test lengths greater than two items. This result
was observed when the conventional test was scored either by the Bayesian or
proportion-correct method.






The concurrent validity analysis showed that the conventional test produced 
ability level estimates that correlated more highly with the criterion test 
scores than did the Bayesian test for all lengths greater than four items. This 
result was observed for both scoring procedures used with the conventional test. 

Limitations of the study, and the conclusions that may be drawn from it, are 
discussed. These limitations, which may have affected the results of this 
study, included possible differences in the alternate forms used within the two 
testing strategies, the relatively small calibration samples used to estimate 
the ICC parameters for the items used in the study, and method variance in the 
conventional tests. (AD A094477) 



Research Report 81-1
Review of Test Theory and Methods
David J. Weiss and Mark L. Davison
January 1981

The research literature on test theory and methods for the period 1975
through early 1980 is critically reviewed. Research on classical test theory
has concentrated on relatively unimportant developments in reliability
theory, with some new developments and applications of generalizability
theory appearing during this period. The reliability of change or gain
scores has received some attention from the classical test theory
perspective, as have the applications of classical reliability concepts in
experimental design and the analysis of experimental data. A minor amount of
research with classical models was in the area of test-score equating.
Classical item analysis procedures, however, received little attention. A
fair amount of research during the period was devoted to different item
types and test item response modes as replacements for the ubiquitous
multiple-choice item. Several types of true-false items were proposed, and
formula scoring was studied by a number of researchers in an attempt to
reduce guessing effects. The perennial topic of response option weighting
received attention, with efforts oriented toward demonstrating effects on
validity and reliability. Response modes studied included
answer-until-correct, confidence weighting, and free-response.

A number of alternatives to classical test theory were studied in an attempt
to solve some of the problems for which classical test theory has proven to
be inadequate. Research on criterion-referenced testing continued during
this period. Latent trait test theory (item response theory, or IRT)
received considerable attention. Research on the 1-parameter IRT model
continued to address problems of parameter estimation, model fit, and
equating. The question of the person-free and sample-free characteristics of
this model (i.e., its robustness) was investigated, with results generally
supporting these desirable characteristics. In addition, a special case of
this model that can account for guessing was developed, and the model was
generalized and successfully applied to polychotomous attitude types of
items. Considerable research occurred on the 2- and 3-parameter IRT models.
The concept of information as a replacement for classical reliability
concepts was studied, and its uses in developing parallel tests were
described. As with the 1-parameter IRT model, problems of parameter
estimation and equating were investigated. These IRT models were
successfully applied to problems of item option weighting and adaptive
testing. Important developments with these models during the period included
the demonstration of their relationship with other psychological measurement
models, and methods for determining fit of individuals to IRT models. As
another alternative to classical test theory, order models were developed
and studied, and several other models were proposed.

Validity issues were also studied during this period. A number of approaches to 
the analysis of multitrait-multimethod matrices were proposed and compared, in-
cluding some based on structural equations models. Issues of predictive validi-
ty studied included necessary sample sizes, validity generalization, and modera- 
tor and suppressor effects. Test fairness issues and their effects on validity 
received considerable attention. Concern was with (1) bias in selection; (2) 
fairness to minorities, including differential and single-groups validity and 
comparisons of regression lines, adverse impact, and bias in test content; and 
(3) fairness to women. 

It is concluded that little of consequence was accomplished in classical test 
theory during this period. The most important developments were in alternatives 
to classical test theory, primarily item response theory. Research in this area 
resulted in data and other developments that will permit a better understanding 
of the range of applicability of these models and their potential for solving 
measurement problems not solvable by classical models. (AD A096157) 



Research Report 81-2 
Effects of Immediate Feedback and Pacing of Item Presentation 
on Ability Test Performance and Psychological Reactions to Testing
Marilyn F. Johnson, David J. Weiss, and J. Stephen Prestwood 

February 1981 

The study investigated the joint effects of knowledge of results (KR or no-KR),
pacing of item presentation (computer or self-pacing), and type of testing
strategy (50-item peaked conventional, variable-length stradaptive, or 50-item 
fixed-length stradaptive test) on ability test performance, test item response 
latency, information, and psychological reactions to testing. The psychological 
reactions to testing were obtained from Likert-type items that assessed test- 
taking anxiety, motivation, perception of difficulty, and reactions to knowledge 
of results. Data were obtained from 447 college students randomly assigned to 
one of the 12 experimental conditions. 

The results indicated that there were no effects on ability estimates due to 
knowledge of results, testing strategy, or pacing of item presentation. Al- 
though average latencies were greater on the stradaptive tests than on the con- 
ventional test, the overall testing time was not substantially longer on the 
adaptive tests and may have been a function of differences in test difficulty. 
Analysis of information values indicated higher levels of information on the 
stradaptive tests than on the conventional test. There was no statistically 
significant main effect for any of the three experimental conditions when test 
anxiety or test-taking motivation were the dependent variables, although there 
were some significant interaction effects. 






These results indicate that testing conditions may interact in a complex way
to determine psychological reactions to the testing environment. The
interactions do suggest, however, a somewhat consistent standardizing effect
of KR on test anxiety and test-taking motivation. This standardizing effect
of KR showed that approximately equal levels of motivation and anxiety were
reported under the various testing conditions when KR was provided, but that
mean levels of these variables were substantially different when KR was not
provided. Consistent with theoretical expectations, the conventional test
was perceived as being either too easy or too difficult, whereas the
adaptive tests were perceived more often as being of appropriate difficulty.

The results concerning the effects of KR on test performance, motivation, and 
anxiety found in this study were contrary to earlier reported findings; and dif- 
ferences in the studies are delineated. Recommendations are made concerning the 
control of specific testing conditions, such as difficulty of the test and abil- 
ity level of the examinee population, as well as suggestions for the further 
analysis of the standardizing effect of KR. (AD A097688) 



Research Report 81-3 
A Validity Comparison of Adaptive and Conventional 
Strategies for Mastery Testing 
G. Gage Kingsbury and David J. Weiss 
September 1981 

Conventional mastery tests designed to make optimal mastery classifications were 
compared with fixed-length and variable-length adaptive mastery tests in terms 
of validity of decisions with respect to an external criterion measure. Compar-
isons between the testing procedures were made across five content areas in an
introductory biology course from tests administered to over 400 volunteer stu- 
dents. The criterion measure used was the student's final standing in the 
course, based on course examinations and laboratory grades. Results indicated 
that the adaptive test resulted in mastery classifications that were more con- 
sistent with final class standing than those obtained from the conventional 
test. This result was observed within individual content areas and for discrim- 
inant analysis classifications made across content areas. This result was also 
observed for two scoring procedures used with the conventional test (proportion- 
correct and Bayesian scoring). Results also indicated that there was no decre- 
ment in the performance of the adaptive test when a variable termination rule 
was implemented. This variable termination rule resulted in test lengths which 
were, on the average, 74% to 88% shorter than the original adaptive tests. Fur- 
ther analyses explicated the manner in which the adaptive tests administered 
differed from the conventional test for each content area as a function of
achievement level. This evidence was used to explain why the adaptive tests 
resulted in more valid decisions than the conventional procedure, in spite of 
the fact that the type of conventional test used here was the most informative 
test concerning the mastery cutoff. It is concluded that variable-length adap- 
tive mastery tests can provide more valid mastery classifications than "optimal" 
conventional mastery tests while reducing test length an average of 80% from the 
length of the conventional tests. (AD A106867) 






Research Report 81-4 
Factors Influencing the Psychometric Characteristics of an 
Adaptive Testing Strategy for Test Batteries 
Vincent A. Maurelli and David J. Weiss
November 1981 

A monte carlo simulation was conducted to assess the effects in an adaptive
testing strategy for test batteries of varying subtest order, subtest
termination criterion, and variable versus fixed entry on the psychometric
properties of an existent achievement test battery. Comparisons were made
among conventionally administered tests and adaptive tests using adaptive
intra-subtest item selection with and without inter-subtest branching. Data
consisted of responses of 300 simulees to a 201-item achievement test
battery. Mean test battery length was reduced by 42.5% to 52.3% using
adaptive intra-subtest item selection with variable termination. Reductions
in mean subtest lengths ranged from 27% to 67%. When inter-subtest branching
was added, additional test length reductions of 1% to 2% were observed for
individual subtests. The reductions in test length were achieved with no
significant loss of fidelity or psychometric information. The addition of
inter-subtest branching resulted in levels of mean test battery information
more similar to those of the full test battery, even with mean test battery
reductions of 50% in number of items administered. Subtest order was shown
to have no effect on the evaluative criteria employed. The results generally
supported previous studies of this adaptive testing strategy. Suggestions
for future research are presented. (AD A109666)



Research Report 81-5 
Dimensionality of Measured Achievement Over Time 
Kathleen A. Gialluca and David J. Weiss 
December 1981 



Some type of difference or change score is frequently used to quantify the 
effects of experimental treatments and educational programs on individuals and 
on groups of individuals. Whether the change measurement involves the use of 
simple difference scores, their derivatives, or some more complex methodological 
design, the measurement process itself assumes that the treatment or instruction 
results in higher levels of the originally measured variable and that the only 
change that occurs is a quantitative one. If this assumption is not met, then 
the computation of any type of difference score is inappropriate and the scores
themselves are useless for measuring growth or change. 

Two studies investigated the tenability of the assumption that classroom
instruction results in increases in students' achievement levels while the
qualitative nature of that achievement remains constant across time. The
data utilized were the item responses to tests in basic mathematics and in
general biology administered as pretests and after instruction to students
enrolled in those courses.



Results indicated that this assumption was not tenable in the biology data
set, where increases in mean achievement level were accompanied by
corresponding changes in the factor structure underlying the item responses.
For the mathematics data, however, there was no such violation of the
assumption: As student achievement levels increased the underlying factor
structure remained unchanged. The implications of these results for
psychology, education, and program evaluation are noted. (AD A110955)



Research Report 83-2 
Bias and Information of Bayesian Adaptive Testing 
David J. Weiss and James R. McBride 
March 1983 

Monte carlo simulation was used to investigate score bias and information
characteristics of Owen's Bayesian adaptive testing strategy, and to examine
possible causes of score bias. Factors investigated in three related studies
included effects of an accurate prior θ estimate, effects of item
discrimination, and effects of fixed vs. variable test length. Data were
generated from a three-parameter logistic model for 3,100 simulees in each
of eight data sets; Bayesian adaptive tests were administered, drawing items
from a "perfect" item pool. Results showed that the Bayesian adaptive test
yielded unbiased θ estimates and relatively flat information functions only
in the unrealistic situation in which an accurate prior θ estimate was used.
When a more realistic constant prior θ estimate was used with a fixed test
length, severe bias was observed that varied with item discrimination. A
different pattern of bias was observed with variable test length and a
constant prior. Information curves for the constant prior conditions
generally became more peaked and asymmetric with increasing item
discrimination. In the variable test length condition the test length
required to achieve a specified level of the posterior variance of θ
estimates was an increasing function of θ level. These results indicate that
θ estimates from Owen's Bayesian adaptive testing method are affected by the
prior θ estimate used and that the method does not provide measurements that
are unbiased and equiprecise except under the unrealistic condition of an
accurate prior θ estimate. (AD A129280)
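
The role of the prior can be illustrated with a small exact calculation.
The sketch below does not implement Owen's procedure and is not adaptive: it
assumes a fixed six-item test, a 2-parameter logistic model, a unit-variance
normal prior, and a posterior mean computed on a grid. For a given true θ it
enumerates every response pattern and averages the resulting estimates, so
the difference from the true θ is the bias at that level. All names and
parameter values are hypothetical.

```python
import itertools
import math

# Grid of theta values from -6 to +6 used to approximate the posterior.
GRID = [-6.0 + 0.1 * i for i in range(121)]

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))

def posterior_mean(pattern, items, prior_mean):
    """Posterior mean of theta under a N(prior_mean, 1) prior."""
    num = den = 0.0
    for t in GRID:
        w = math.exp(-0.5 * (t - prior_mean) ** 2)
        for u, (a, b) in zip(pattern, items):
            p = p_2pl(t, a, b)
            w *= p if u else 1.0 - p
        num += t * w
        den += w
    return num / den

def expected_bias(true_theta, items, prior_mean):
    """E[estimate | true_theta] - true_theta, exact over all patterns."""
    expected = 0.0
    for pattern in itertools.product((0, 1), repeat=len(items)):
        prob = 1.0
        for u, (a, b) in zip(pattern, items):
            p = p_2pl(true_theta, a, b)
            prob *= p if u else 1.0 - p
        expected += prob * posterior_mean(pattern, items, prior_mean)
    return expected - true_theta

# Hypothetical six-item test with difficulties spread around zero.
items = [(1.0, b) for b in (-1.5, -0.9, -0.3, 0.3, 0.9, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    constant = expected_bias(theta, items, prior_mean=0.0)
    accurate = expected_bias(theta, items, prior_mean=theta)
    print(theta, round(constant, 3), round(accurate, 3))
```

With the constant prior the estimates are pulled toward the prior mean, so
the bias is negative for high θ and positive for low θ; supplying the true θ
as the prior mean shrinks that bias, which mirrors the direction of the
result reported above.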



Research Report 83-3 
Effect of Examinee Certainty on Probabilistic Test Scores 
and a Comparison of Scoring Methods for Probabilistic Responses 
Debra Suhadolnik and David J. Weiss 
July 1983 

The present study was an attempt to alleviate some of the difficulties inherent 
in multiple-choice items by having examinees respond to multiple-choice items in 
a probabilistic manner. Using this format, examinees are able to respond to 
each alternative and to provide indications of any partial knowledge they may 
possess concerning the item. The items used in this study were 30 multiple- 
choice analogy items. Examinees were asked to distribute 100 points among the 
four alternatives for each item according to how confident they were that each 
alternative was the correct answer. Each item was scored using five different 
scoring formulas. Three of these scoring formulas (the spherical, quadratic,
and truncated log scoring methods) were reproducing scoring systems. The
fourth scoring method used the probability assigned to the correct
alternative as the item score, and the fifth used a function of the absolute
difference between the correct response vector for the four alternatives and
the actual points assigned to each alternative as the item score. Total test
scores for all of the scoring methods were obtained by summing individual
item scores.
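
The scoring formulas named above can be sketched in their common textbook
forms. The exact formulas, constants, and truncation point used in the study
are not given in this abstract, so everything below is an assumption: the
floor of 0.01 for the log score, the function names, and the omission of the
fifth (absolute difference) method, whose functional form is not stated.
The expected_score helper shows what "reproducing" means: under such a
system an examinee's expected score is highest when the reported
probabilities equal the examinee's actual beliefs.

```python
import math

def points_to_probs(points):
    """Convert 100 points distributed over alternatives to probabilities."""
    total = sum(points)
    return [x / total for x in points]

def spherical(p, k):
    """Spherical score when alternative k is correct."""
    return p[k] / math.sqrt(sum(x * x for x in p))

def quadratic(p, k):
    """Quadratic score when alternative k is correct."""
    return 2.0 * p[k] - sum(x * x for x in p)

def truncated_log(p, k, floor=0.01):
    """Log score, truncated so a zero probability gives a finite penalty."""
    return math.log(max(p[k], floor))

def prob_correct(p, k):
    """Probability assigned to the correct alternative (not reproducing)."""
    return p[k]

def expected_score(rule, belief, report):
    """Expected score of a report if alternative k is correct with prob belief[k]."""
    return sum(b * rule(report, k) for k, b in enumerate(belief))

belief = points_to_probs([70, 10, 10, 10])   # what the examinee believes
overstated = [1.0, 0.0, 0.0, 0.0]            # all points on one alternative
for rule in (spherical, quadratic, truncated_log, prob_correct):
    honest = expected_score(rule, belief, belief)
    bold = expected_score(rule, belief, overstated)
    print(rule.__name__, round(honest, 4), round(bold, 4))
```

For the three reproducing rules the honest report has the higher expected
score, whereas for the probability-assigned-to-correct rule the overstated
report does better.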

Several studies using probabilistic response methods have shown the effect of a 
response-style variable, called certainty or risk taking, on scores obtained 
from probabilistic responses. Results from this study showed a small effect of 
certainty on the probabilistic scores in terms of the validity of the scores but 
no effect at all on the factor structure or internal consistency of the scores. 
Once the effect of certainty on the probabilistic scores had been ruled out, the 
five scoring formulas were compared in terms of validity, reliability, and fac-
tor structure. There were no differences in the validity of the scores from the
different methods, but scores obtained from the two scoring formulas that were
not reproducing scoring systems were more reliable and had stronger first fac-
tors than the scores obtained using the reproducing scoring systems. For prac-
tical use, however, the reproducing scoring systems may have an advantage be-
cause they maximize examinees' scores when examinees respond honestly, while
honest responses will not necessarily maximize an examinee's score with the oth-
er two methods. If a reproducing scoring system is used for this reason, the
spherical scoring formula is recommended, since it was the most internally con- 
sistent and showed the strongest first factor of the reproducing scoring sys- 
tems. 






REFERENCES 



Crichton, L. J. (1981, June). Effect of error in item parameter estimates on
adaptive testing. Unpublished doctoral dissertation, University of
Minnesota.

Gialluca, K. A., & Weiss, D. J. (1979, November). Efficiency of an adaptive
inter-subtest branching strategy in the measurement of classroom
achievement (Research Report 79-6). Minneapolis: University of Minnesota,
Department of Psychology, Psychometric Methods Program, Computerized Adaptive
Testing Laboratory.

Gialluca, K. A., & Weiss, D. J. (1981, December). Dimensionality of measured
achievement over time (Research Report 81-5). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Johnson, M. F., Weiss, D. J., & Prestwood, J. S. (1981, February). Effects of
immediate feedback and pacing of item presentation on ability test
performance and psychological reactions to testing (Research Report 81-2).
Minneapolis: University of Minnesota, Department of Psychology, Psychometric
Methods Program, Computerized Adaptive Testing Laboratory.

Kingsbury, G. G. (1984, August). Adaptive self-referenced testing as a
procedure for the measurement of individual change due to instruction: A
comparison of the reliabilities of change estimates obtained from adaptive
and conventional testing procedures. Unpublished doctoral dissertation,
University of Minnesota.

Kingsbury, G. G., & Weiss, D. J. (1980a, September). A comparison of ICC-based
adaptive mastery testing and the Waldian probability ratio method. In D.
J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing
Conference (pp. 120-139). Minneapolis: University of Minnesota, Department
of Psychology, Psychometric Methods Program, Computerized Adaptive Testing
Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1980b, November). A comparison of adaptive,
sequential, and conventional testing strategies for mastery decisions
(Research Report 80-4). Minneapolis: University of Minnesota, Department of
Psychology, Psychometric Methods Program, Computerized Adaptive Testing
Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1980c, December). An alternate-forms
reliability and concurrent validity comparison of Bayesian adaptive and
conventional ability tests (Research Report 80-5). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Kingsbury, G. G., & Weiss, D. J. (1981, September). A validity comparison of
adaptive and conventional strategies for mastery testing (Research Report
81-3). Minneapolis: University of Minnesota, Department of Psychology,
Psychometric Methods Program, Computerized Adaptive Testing Laboratory.

Mattson, J. D. (1983, June). Effects of item parameter error and other factors
on trait estimation in latent-trait-based adaptive testing. Unpublished
doctoral dissertation, University of Minnesota.

Maurelli, V. A., & Weiss, D. J. (1981, November). Factors influencing the
psychometric characteristics of an adaptive testing strategy for test
batteries (Research Report 81-4). Minneapolis: University of Minnesota,
Department of Psychology, Psychometric Methods Program, Computerized Adaptive
Testing Laboratory.

Suhadolnik, D., & Weiss, D. J. (1983, July). Effect of examinee certainty on
probabilistic test scores and a comparison of scoring methods for
probabilistic responses (Research Report 83-3). Minneapolis: University of
Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory.

Suhadolnik, D., & Weiss, D. J. (1985). Robustness of adaptive testing to
multidimensionality. In D. J. Weiss (Ed.), Proceedings of the 1982 Item
Response Theory and Computerized Adaptive Testing Conference (pp. 248-280).
Minneapolis: University of Minnesota, Department of Psychology, Psychometric
Methods Program, Computerized Adaptive Testing Laboratory.

Thompson, J. G. (1983, August). An investigation of the dimensionality of
multiple-choice and dichotomous vocabulary test items administered in
probabilistic and categorical response formats. Unpublished Master's thesis,
University of Minnesota.

Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive
testing. Applied Psychological Measurement, 6, 473-492.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive
testing to educational problems. Journal of Educational Measurement, 21,
361-375.

Weiss, D. J., & Davison, M. L. (1981, January). Review of test theory and meth- 
ods (Research Report 81-1). Minneapolis: University of Minnesota, Depart- 
ment of Psychology, Psychometric Methods Program, Computerized Adaptive 
Testing Laboratory. 

Weiss, D. J., & McBride, J. R. (1983, March). Bias and information of Bayesian
adaptive testing (Research Report 83-2). Minneapolis: University of
Minnesota, Department of Psychology, Psychometric Methods Program,
Computerized Adaptive Testing Laboratory.

Weiss, D. J., & McBride, J. R. (1984). Bias and information of Bayesian
adaptive testing. Applied Psychological Measurement, 8, 273-285.






Distribution List 



: :r. H^::k fens 
y^ti:5 rt Naval F.Bs^arc:^ 
i:ai%tip C'fsrs, Fa^ East 
H?2 San ?rar::5wC, G "^dS^S 

: It. /;:5<2r.*icr Bc*'^ 

1 Dr. j^^fcert Ir^zu^ 
CMa-^dc. Fl :2Bn 
NAVOP 115 

WasrirxtDn , DC :}:7: 

1 Dr. Stanley Collyer
Office of Naval Technology
N. Quincy Street
Arlington, VA 22217

1 Mr. Mike Curran
Office of Naval Research
800 N. Quincy St.
Code 270
Arlington, VA 22217

1 Dr. Charles E. Davis
Personnel and Training Research
Office of Naval Research (Code 442PT)
800 North Quincy Street
Arlington, VA 22217

1 Dr. John Ellis
Navy Personnel R&D Center
San Diego, CA 92152

1 DR. PAT FEDERICO
Code P13
San Diego, CA 92152

1 Mr. Paul Foley
Navy Personnel R&D Center
San Diego, CA 92152




1 *«5. Kcbecca h^llH 
Xavir Ferssnrsel RID Center tCccs tl' 
San Die::, CA 92152 

\ Kr. Did «2Sh5w 
KAV0F-:15 
AMngtor Sirsx 
Rgo2 2834 

^'ashr.gtcn , s; 2JJ50 

1 Dr. Norman J. Kerr
Chief of Naval Education and Training
Code 05S2
Naval Air Station
Pensacola, FL 32508

1 iV. rror^er 
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. Daryll Lang
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. William L. Maloy (02)
Chief of Naval Education and Training
Naval Air Station
Pensacola, FL 32508

1 Dr. James McBride
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. William Montague
NPRDC Code \l
San Diego, CA 92152

1 Ms. Kathleen Moreno
Navy Personnel Center (Code bl)
San Diego, CA 92152

1 Library, Code P201L
Navy Personnel R&D Center
San Diego, CA 92152

1 Technical Director
Navy Personnel R&D Center
San Diego, CA 92152

6 Personnel & Training Research Group
Code 442PT
Office of Naval Research
Arlington, VA 22217

1 Dr. Carl Ross
CNET-PDCD
Building 90
Great Lakes NTC, IL 60088




1 Mr. Drew Sands
NPRDC Code 62
San Diego, CA 92152

1 Dr. Mary Schratz
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. Alfred F. Smode
Senior Scientist
Code 7B
Naval Training Equipment Center
Orlando, FL 32813

1 Dr. Richard Snow
Liaison Scientist
Office of Naval Research
Branch Office, London
Box 39
FPO New York, NY 09510

1 Dr. Richard Sorensen
Navy Personnel R&D Center
San Diego, CA 92152

1 Mr. Brad Sympson
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. Frank Vicino
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. Ronald Weitzman
Naval Postgraduate School
Department of Administrative Sciences
Monterey, CA 93940

1 Dr. Douglas Wetzel
Code 12
Navy Personnel R&D Center
San Diego, CA 92152

1 DR. MARTIN F. WISKOFF
NAVY PERSONNEL R&D CENTER
SAN DIEGO, CA 92152

1 Mr. John Wolfe
Navy Personnel R&D Center
San Diego, CA 92152

1 Dr. Wallace Wulfeck, III
Navy Personnel R&D Center
San Diego, CA 92152




Marine Corps
1 C:!. ^.av Uidich 

m 

KasMnjtci, DC :C3E0 

1 Special Assistant for Marine
Corps Matters
Code 100M
Office of Naval Research
800 N. Quincy St.
Arlington, VA 22217

: i-iicr «^'an.^ Ychanr.ari, USMC 
ilishncton, CC 2(38? 

1 Dr. Kent Eaton
Army Research Institute
5001 Eisenhower Blvd.
Alexandria, VA 22333

1 Dr. Myron Fischl
U.S. Army Research Institute for the
Social and Behavioral Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Clessen Martin
Army Research Institute
5001 Eisenhower Blvd.
Alexandria, VA 22333

1 Dr. Karen Mitchell
Army Research Institute
5001 Eisenhower Blvd.
Alexandria, VA 22333

1 D*-. ;ilhaa £. Ncrdbrcck 
ra-ADCO Bojf 25 
AFB. HY 09710 

1 Dr. Harold F. O'Neil, Jr.
Director, Training Research Lab
Army Research Institute
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Commander, U.S. Army Research Institute
for the Behavioral & Social Sciences
ATTN: PERI-BR (Dr. Judith Orasanu)
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Mr. Robert Ross
U.S. Army Research Institute for the
Social and Behavioral Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Robert Sasmor
U.S. Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Joyce Shields
Army Research Institute for the
Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333

1 Dr. Hilda Wing
Army Research Institute
5001 Eisenhower Ave.
Alexandria, VA 22333

Air Force

1 Dr. Earl A. Alluisi
HQ, AFHRL (AFSC)
Brooks AFB, TX 78235

1 Col. Roger Campbell
AF/MPXOA
Pentagon, Room 4E1?5
Washington, DC 20330

1 Mr. Raymond E. Christal
AFHRL/MOE
Brooks AFB, TX 78235

1 Dr. Alfred R. Fregly
AFOSR/NL
Bolling AFB, DC 20332

1 Dr. Patrick Kyllonen
AFHRL/MOE
Brooks AFB, TX 78235

1 Dr. Randolph Park
AFHRL/MOAN
Brooks AFB, TX 78235

1 Dr. Roger Pennell
Air Force Human Resources Laboratory
Lowry AFB, CO 80230

1 Dr. Malcolm Ree
AFHRL/MP
Brooks AFB, TX 78235

1 Maj. Bill Strickland
AF/MPXOA
4E168 Pentagon
Washington, DC 20330

1 Dr. John Tangney
AFOSR/NL
Bolling AFB, DC 20332

1 Major John Welsh
AFHRL/MOAN
Brooks AFB, TX 78235

1 Dr. Joseph Yasutake
AFHRL/LRT
Lowry AFB, CO 80230

Department of Defense

12 Defense Technical Information Center
Cameron Station, Bldg 5
Alexandria, VA 22314
Attn: TC

1 Dr. Anita Lancaster
Accession Policy
OASD/flR/KFiFh/AP
Pentagon, Room 2B271
Washington, DC 20301

1 Dr. Jerry Lehnus
OASD imk)
Washington, DC 20301

1 Dr. Clarence McCormick
HQ, MEPCOM
MEPCT-P
2500 Green Bay Road
North Chicago, IL 60064

1 Military Assistant for Training and
Personnel Technology
Office of the Under Secretary of Defense
for Research & Engineering
Room 3D129, The Pentagon
Washington, DC 20301






1 nr. Steve SeHisari 
CHice 0^ the Asssstant Srcrctarv 
c{ t:-*ense 'SPA 1 'J 

1 "ebert fi. i^is^er 

ftf3l:n3torM DC 20301 

Civilian Agencies

I Ir. c;a A. tutir 

1200 .^t*-- St., NJ? 
ts£=ninjtGn, DC 2)209 

1 Dr. Vern Urry
Office of Personnel Management
1900 E Street NW

1 Trcfas A. k=rfc 
y. E, Coast S'jsrcl I^.stitute 
P. C. Sutstaticn !3 
0^3^caa ::ty, OK ^3169 

1 Dr. Joseph L. Young, Director
Memory & Cognitive Processes
National Science Foundation
Washington, DC 20550

Private Sector

1 Dr. Erling P. Andersen
Department of Statistics
Studiestraede 6
1455 Copenhagen
DENMARK

1 Dr. Isaac Bejar
Educational Testing Service
Princeton, NJ 08450

2 Dr. Menucha Birenbaum
School of Education
Tel Aviv University
Tel Aviv, Ramat Aviv 69978
Israel

1 Werner Birke
Personalstammamt der Bundeswehr
D-5000 Koeln 90
WEST GERMANY

1 Dr. Darrell Bock
Department of Education
University of Chicago
Chicago, IL 60637

1 Mr. Arnold Bohrer
Section of Psychological Research
Caserne Petits Chateau
CRS
1000 Brussels
Belgium

1 Dr. Robert Brennan
American College Testing Programs
P. O. Box 168
Iowa City, IA 52243

1 Dr. Ernest R. Cadotte
307 Stokely
University of Tennessee
Knoxville, TN 37916

1 Dr. James Carlson
American College Testing Program
P.O. Box 168
Iowa City, IA 52243

1 Dr. John B. Carroll
409 Elliott
Chapel Hill, NC 27514

1 Dr. Norman Cliff
Dept. of Psychology
Univ. of So. California
University Park
Los Angeles, CA 90007

1 Dr. Hans Crombag
Education Research Center
University of Leyden
Boerhaavelaan 2
2334 EN Leyden
The NETHERLANDS

1 Lee Cronbach
16 Laburnum Road
Atherton, CA 94205

1 CTB/McGraw-Hill Library
2500 Garden Road
Monterey, CA 93940

1 Mr. Timothy Davey
University of Illinois
Department of Educational Psychology
Urbana, IL 61801



1 Dr. Dattprasad Divgi
Syracuse University
Department of Psychology
Syracuse, NY 13210

1 Dr. Emmanuel Donchin
Department of Psychology
University of Illinois
Champaign, IL 61820

1 Dr. Hei-Ki Dong
Ball Foundation
800 Roosevelt Road
Building C, Suite 206
Glen Ellyn, IL 60137

1 Dr. Fritz Drasgow
Department of Psychology
University of Illinois
603 E. Daniel St.
Champaign, IL 61820

1 Dr. Stephen Dunbar
Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. John K. Eddins
University of Illinois
252 Engineering Research Laboratory
103 South Mathews Street
Urbana, IL 61801

1 Dr. Susan Embertson
PSYCHOLOGY DEPARTMENT
UNIVERSITY OF KANSAS
Lawrence, KS 66045

1 ERIC Facility-Acquisitions
4833 Rugby Avenue
Bethesda, MD 20014

1 Dr. Benjamin A. Fairbank, Jr.
Performance Metrics, Inc.
5825 Callaghan
Suite 225
San Antonio, TX 78228

1 Dr. Leonard Feldt
Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. Richard L. Ferguson
The American College Testing Program
P.O. Box 168
Iowa City, IA 52240



1 Univ. Prof. Dr. Gerhard Fischer
Lisblggasse S/! 

I ^rolessor Unili Fitzgerald 

Arci'iaU, liih South Kales 23fl 
AUSTRALIA 

! Dr. Drxtsr n^iz^sr 
University of Oregon
Department of Computer Science
Ejgene, OR 574:5 

1 Dr. Janice Gifford
University of Massachusetts

1 Dr. Robert Glaser
Learning Research & Development Center
University of Pittsburgh
3939 O'Hara Street
PITTSBURGH, PA 15260

1 Dr. Bert Green
Johns Hopkins University
Department of Psychology
Charles & 34th Street
Baltimore, MD 21218

1 Dipl. Pad. Michael W. Habon
Universitat Dusseldorf
Erziehungswissenschaftliches Inst. II
Universitatsstr. 1
D-4000 Dusseldorf 1
WEST GERMANY

1 Dr. Ron Hambleton
School of Education
University of Massachusetts
Amherst, MA 01002

1 Dr. Delwyn Harnisch
University of Illinois
51 Gerty Drive
Champaign, IL 61820

1 Prof. Lutz F. Hornke
Universitat Dusseldorf
Erziehungswissenschaftliches Inst. II
Universitatsstr. 1
Dusseldorf 1
WEST GERMANY



1 Dr. Paul Horst
477 5 Street, 1184
Chula Vista, CA 90010

1 Dr. Lloyd Humphreys
Department of Psychology
University of Illinois
603 East Daniel Street
Champaign, IL 61820

1 Dr. Steven Hunka
Department of Education
University of Alberta
Edmonton, Alberta
CANADA

1 Dr. Jack Hunter
2122 Coolidge St.
Lansing, MI 48906

1 Dr. Huynh Huynh
College of Education
University of South Carolina
Columbia, SC 29208

1 Dr. Douglas H. Jones
Advanced Statistical Technologies
Corporation
10 Trafalgar Court
Lawrenceville, NJ 08648

1 Professor John A. Keats
Department of Psychology
The University of Newcastle
N.S.W. 2308
AUSTRALIA

1 Dr. William Koch
University of Texas-Austin
Measurement and Evaluation Center
Austin, TX 78703

1 Dr. Thomas Leonard
University of Wisconsin
Department of Statistics
1210 West Dayton Street
Madison, WI 53705

1 Dr. Alan Lesgold
Learning R&D Center
University of Pittsburgh
3939 O'Hara Street
Pittsburgh, PA 15260



1 Dr. Michael Levine
Department of Educational Psychology
210 Education Bldg.
University of Illinois
Champaign, IL 61801

1 Dr. Charles Lewis
Faculteit Sociale Wetenschappen
Rijksuniversiteit Groningen
Oude Boteringestraat 23
9712GC Groningen
Netherlands

1 Dr. Robert Linn
College of Education
University of Illinois
Urbana, IL 61801

4 Dr. Robert Lockman
Center for Naval Analysis
200 North Beauregard St.
Alexandria, VA 22311

1 Dr. Frederic M. Lord
Educational Testing Service
Princeton, NJ 08541

1 Dr. James Lumsden
Department of Psychology
University of Western Australia
Nedlands W.A. 6009
AUSTRALIA

1 Dr. Gary Marco
Stop 31-E
Educational Testing Service
Princeton, NJ 08451

1 Mr. Robert McKinley
University of Toledo
Dept of Educational Psychology
Toledo, OH 43606

1 Dr. Barbara Means
Human Resources Research Organization
300 North Washington
Alexandria, VA 22314

1 Dr. Robert Mislevy
Educational Testing Service
Princeton, NJ 08541

1 Dr. W. Alan Nicewander
University of Oklahoma
Department of Psychology
Oklahoma City, OK 73069






1 Dr. Melvin R. Novick
356 Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. James Olson
WICAT, Inc.
1875 South State Street
Orem, UT 84057

1 Wayne M. Patience
American Council on Education
GED Testing Service, Suite 20
One Dupont Circle, NW
Washington, DC 20036

1 Dr. James Paulson
Dept. of Psychology
Portland State University
P.O. Box 751
Portland, OR 97207

1 Dr. Mark D. Reckase
ACT
P. O. Box 168
Iowa City, IA 52243

1 Dr. Lawrence Rudner
403 Elm Avenue
Takoma Park, MD 20012

1 Dr. J. Ryan
Department of Education
University of South Carolina
Columbia, SC 29208

1 PROF. FUMIKO SAMEJIMA
DEPT OF PSYCHOLOGY
UNIVERSITY OF TENNESSEE
KNOXVILLE, TN 37916

1 Frank L. Schmidt
Department of Psychology
Bldg. 65
George Washington University
Washington, DC 20052

1 Lowell Schoer
Psychological & Quantitative
Foundations
College of Education
University of Iowa
Iowa City, IA 52242

1 Dr. Kazuo Shigemasu
7-9-24 Kugenuma-Kaigan
Fujisawa 251
JAPAN



i Dr. UiUiac 
Center for Naval Analysis
100 North Beauregard Street
Alexandria, VA 22311

1 Dr. H. Wallace Sinaiko
Program Director
Manpower Research and Advisory Services
Smithsonian Institution
801 North Pitt Street
Alexandria, VA 22314

1 Martha Stocking
Educational Testing Service
Princeton, NJ 08541

1 Dr. Peter Stoloff
Center for Naval Analysis
200 North Beauregard Street
Alexandria, VA 22311

1 Dr. William Stout
University of Illinois
Department of Mathematics
Urbana, IL 61801

1 Dr. Hariharan Swaminathan
Laboratory of Psychometric and
Evaluation Research
School of Education
University of Massachusetts
Amherst, MA 01003

1 Dr. Kikumi Tatsuoka
Computer Based Education Research Lab
252 Engineering Research Laboratory
Urbana, IL 61801

1 Dr. Maurice Tatsuoka
220 Education Bldg
1310 S. Sixth St.
Champaign, IL 61820

1 Dr. David Thissen
Department of Psychology
University of Kansas
Lawrence, KS 66044

1 Mr. Gary Thomasson
University of Illinois
Department of Educational Psychology
Champaign, IL 61820

1 Dr. Robert Tsutakawa
Department of Statistics
University of Missouri
Columbia, MO 65201



1 Dr. Ledyard Tucker
University of Illinois
Department of Psychology
603 E. Daniel Street
Champaign, IL 61820

1 Dr. David Vale
Assessment Systems Corporation
2233 University Avenue
Suite 310
St. Paul, MN 55114

1 Dr. Howard Wainer
Division of Psychological Studies
Educational Testing Service
Princeton, NJ 08540

1 Dr. Ming-Mei Wang
Lindquist Center for Measurement
University of Iowa
Iowa City, IA 52242

1 Dr. Brian Waters
HumRRO
300 North Washington
Alexandria, VA 22314

1 Dr. Rand R. Wilcox
University of Southern California
Department of Psychology
Los Angeles, CA 90007

1 German Military Representative
ATTN: Wolfgang Wildegrube
Streitkraefteamt
D-5300 Bonn 2
4000 Brandywine Street,
Washington, DC 20016

1 Dr. Bruce Williams
Department of Educational Psychology
University of Illinois
Urbana, IL 61801

1 Ms. Marilyn Wingersky
Educational Testing Service
Princeton, NJ 08541

1 Dr. George Wong
Biostatistics Laboratory
Memorial Sloan-Kettering Cancer Center
1275 York Avenue
New York, NY 10021

1 Dr. Wendy Yen
CTB/McGraw Hill
Del Monte Research Park
Monterey, CA 93940






Previous Publications 



Proceedings of the 1982 Item Response Theory and Computerized Adaptive
       Testing Conference. March 1985.
Proceedings of the 1979 Computerized Adaptive Testing Conference.
       September 1980.
Proceedings of the 1977 Computerized Adaptive Testing Conference.
       July 1978.

Research Reports

83-3.  Effect of Examinee Certainty on Probabilistic Test Scores and a Comparison
       of Scoring Methods for Probabilistic Responses. July 1983.
83-2.  Bias and Information of Bayesian Adaptive Testing. March 1983.
83-1.  Reliability and Validity of Adaptive and Conventional Tests in a Military
       Recruit Population. January 1983.
81-5.  Dimensionality of Measured Achievement Over Time. December 1981.
81-4.  Factors Influencing the Psychometric Characteristics of an Adaptive
       Testing Strategy for Test Batteries. November 1981.
81-3.  A Validity Comparison of Adaptive and Conventional Strategies for Mastery
       Testing. September 1981.
       Final Report: Computerized Adaptive Ability Testing. April 1981.
81-2.  Effects of Immediate Feedback and Pacing of Item Presentation on Ability
       Test Performance and Psychological Reactions to Testing. February 1981.
81-1.  Review of Test Theory and Methods. January 1981.
80-5.  An Alternate-Forms Reliability and Concurrent Validity Comparison of
       Bayesian Adaptive and Conventional Ability Tests. December 1980.
80-4.  A Comparison of Adaptive, Sequential, and Conventional Testing Strategies
       for Mastery Decisions. November 1980.
80-3.  Criterion-Related Validity of Adaptive Testing Strategies. June 1980.
80-2.  Interactive Computer Administration of a Spatial Reasoning Test. April
       1980.
       Final Report: Computerized Adaptive Performance Evaluation. February 1980.
80-1.  Effects of Immediate Knowledge of Results on Achievement Test Performance
       and Test Dimensionality. January 1980.
79-7.  The Person Response Curve: Fit of Individuals to Item Characteristic Curve
       Models. December 1979.
79-6.  Efficiency of an Adaptive Inter-Subtest Branching Strategy in the
       Measurement of Classroom Achievement. November 1979.
79-5.  An Adaptive Testing Strategy for Mastery Decisions. September 1979.
79-4.  Effect of Point-in-Time in Instruction on the Measurement of Achievement.
       August 1979.
79-3.  Relationships among Achievement Level Estimates from Three Item
       Characteristic Curve Scoring Methods. April 1979.
       Final Report: Bias-Free Computerized Testing. March 1979.
79-2.  Effects of Computerized Adaptive Testing on Black and White Students.
       March 1979.
79-1.  Computer Programs for Scoring Test Data with Item Characteristic Curve
       Models. February 1979.
78-5.  An Item Bias Investigation of a Standardized Aptitude Test. December 1978.
78-4.  A Construct Validation of Adaptive Achievement Testing. November 1978.

-continued inside-

Previous Publications (continued)

78-3.  A Comparison of Levels and Dimensions of Performance in Black and White
       Groups on Tests of Vocabulary, Mathematics, and Spatial Ability.
       October 1978.
78-2.  The Effects of Knowledge of Results and Test Difficulty on Ability Test
       Performance and Psychological Reactions to Testing. September 1978.
78-1.  A Comparison of the Fairness of Adaptive and Conventional Testing
       Strategies. August 1978.
77-7.  An Information Comparison of Conventional and Adaptive Tests in the
       Measurement of Classroom Achievement. October 1977.
77-6.  An Adaptive Testing Strategy for Achievement Test Batteries. October 1977.
77-5.  Calibration of an Item Pool for the Adaptive Measurement of Achievement.
       September 1977.
77-4.  A Rapid Item-Search Procedure for Bayesian Adaptive Testing. May 1977.
77-3.  Accuracy of Perceived Test-Item Difficulties. May 1977.
77-2.  A Comparison of Information Functions of Multiple-Choice and Free-Response
       Vocabulary Items. April 1977.
77-1.  Applications of Computerized Adaptive Testing. March 1977.
       Final Report: Computerized Ability Testing, 1972-1975. April 1976.
76-5.  Effects of Item Characteristics on Test Fairness. December 1976.
76-4.  Psychological Effects of Immediate Knowledge of Results and Adaptive
       Ability Testing. June 1976.
76-3.  Effects of Immediate Knowledge of Results and Adaptive Testing on Ability
       Test Performance. June 1976.
76-2.  Effects of Time Limits on Test-Taking Behavior. April 1976.
76-1.  Some Properties of a Bayesian Adaptive Ability Testing Strategy. March
       1976.
75-6.  A Simulation Study of Stradaptive Ability Testing. December 1975.
75-5.  Computerized Adaptive Trait Measurement: Problems and Prospects. November
       1975.
75-4.  A Study of Computer-Administered Stradaptive Ability Testing. October
       1975.
75-3.  Empirical and Simulation Studies of Flexilevel Ability Testing. July 1975.
75-2.  TETREST: A FORTRAN IV Program for Calculating Tetrachoric Correlations.
       March 1975.
75-1.  An Empirical Comparison of Two-Stage and Pyramidal Adaptive Ability
       Testing. February 1975.
74-5.  Strategies of Adaptive Ability Measurement. December 1974.
74-4.  Simulation Studies of Two-Stage Ability Testing. October 1974.
74-3.  An Empirical Investigation of Computer-Administered Pyramidal Ability
       Testing. July 1974.
74-2.  A Word Knowledge Item Pool for Adaptive Ability Measurement. June 1974.
74-1.  A Computer Software System for Adaptive Ability Measurement. January 1974.
73-4.  An Empirical Study of Computer-Administered Two-Stage Ability Testing.
       October 1973.
73-3.  The Stratified Adaptive Computerized Ability Test. September 1973.
73-2.  Comparison of Four Empirical Item Scoring Procedures. August 1973.
73-1.  Ability Measurement: Conventional or Adaptive? February 1973.

Copies of these reports are available, while supplies last, from: 



Computerized Adaptive Testing Laboratory 
N660 Elliott Hall 
University of Minnesota 
75 East River Road 
Minneapolis MN 55455 U.S.A. 







